Lecture Notes on SNR-Invariant PLDA

Author

  • Man-Wai MAK
Abstract

This document provides the derivations of the equations in the paper: Na Li and M.W. Mak, "SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification", IEEE/ACM Trans. on Audio Speech and Language Processing, vol. 23, no. 10, pp. 1648-1659, Oct. 2015.

Please cite this document as: M.W. Mak, Lecture Notes on SNR-Invariant PLDA, Technical Report and Lecture Note Series, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, April 2015.

1 SNR-Invariant PLDA

1.1 Generative Model

Denote $\mathbf{x}_{ij}$ as the $j$-th $D$-dimensional i-vector from speaker $i$, where the i-vector is obtained from an utterance with SNR falling into the $k$-th SNR group. Then, we have

$$
\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k} \tag{1}
$$

where $\mathbf{m}$ is a $D \times 1$ vector representing the global mean of i-vectors, $\mathbf{h}_i$ is a $P \times 1$ vector denoting the speaker factor with prior distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$, $\mathbf{w}_k$ is a $Q \times 1$ vector denoting the latent SNR factor with prior distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$, $\boldsymbol{\epsilon}_{ij}^{k}$ is a $D \times 1$ vector denoting the residual with distribution $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$, $\mathbf{V}$ is a $D \times P$ matrix whose columns span the speaker subspace, and $\mathbf{U}$ is a $D \times Q$ matrix whose columns span the SNR subspace.

1.2 EM Formulation

Denote all of the training i-vectors as $\mathcal{X} = \{\mathbf{x}_{ij} \mid i = 1, \ldots, S;\ j = 1, \ldots, H_i(k);\ k = 1, \ldots, K\}$, where $S$ is the number of training speakers, $H_i(k)$ is the number of utterances from speaker $i$ at the $k$-th SNR group, and $K$ is the number of SNR groups. Eq. 1 can be written as:

$$
\mathbf{x}_{ij} = \mathbf{m} + [\mathbf{V}\ \mathbf{U}] \begin{bmatrix} \mathbf{h}_i \\ \mathbf{w}_k \end{bmatrix} + \boldsymbol{\epsilon}_{ij}^{k} = \mathbf{m} + \mathbf{B}\hat{\mathbf{z}}_{ik} + \boldsymbol{\epsilon}_{ij}^{k},
$$

where $\mathbf{B} = [\mathbf{V}\ \mathbf{U}]$ and $\hat{\mathbf{z}}_{ik} = [\mathbf{h}_i^{\mathsf T}\ \mathbf{w}_k^{\mathsf T}]^{\mathsf T}$. The model parameters $\boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \mathbf{U}, \boldsymbol{\Sigma}\}$ are estimated by an EM algorithm, in which the loading matrices $\mathbf{V}$ and $\mathbf{U}$ can be estimated either separately or jointly.

1.2.1 Decoupled Estimation of Loading Matrices

In this approach, we focus on one latent factor at a time and marginalize over the other latent factors. For example, to estimate $\mathbf{V}$, we compute the posterior expectation of $\mathbf{h}_i$ by marginalizing over $\mathbf{w}_k$.
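
Before deriving the posterior, the generative model of Eq. (1) and its stacked form with B = [V U] can be illustrated with a minimal NumPy sketch. The dimensions, the random parameter values, and the helper name `sample_ivector` are assumptions made for illustration only; they do not come from the notes or the paper.

```python
import numpy as np

def sample_ivector(m, V, U, Sigma, h_i, w_k, rng):
    """Draw x_ij = m + V h_i + U w_k + eps with eps ~ N(0, Sigma), as in Eq. (1).

    Hypothetical helper: returns both the i-vector and the residual that was drawn.
    """
    eps = rng.multivariate_normal(np.zeros(m.shape[0]), Sigma)
    return m + V @ h_i + U @ w_k + eps, eps

rng = np.random.default_rng(0)
D, P, Q = 6, 3, 2                            # toy dimensions (assumed)
m = rng.standard_normal(D)                   # global mean
V = rng.standard_normal((D, P))              # speaker loading matrix
U = rng.standard_normal((D, Q))              # SNR loading matrix
Sigma = np.diag(rng.uniform(0.5, 1.5, D))    # toy residual covariance

h_i = rng.standard_normal(P)                 # speaker factor drawn from N(0, I)
w_k = rng.standard_normal(Q)                 # SNR factor drawn from N(0, I)
x_ij, eps = sample_ivector(m, V, U, Sigma, h_i, w_k, rng)

# Stacked form: B = [V U] and z_ik = [h_i^T w_k^T]^T
B = np.hstack([V, U])
z_ik = np.concatenate([h_i, w_k])
x_stacked = m + B @ z_ik + eps               # same i-vector, written with B
```

Both expressions produce the same vector, which is all the stacked notation claims: concatenating the loading matrices and the latent factors leaves the model unchanged.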
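Thus, the posterior density of $\mathbf{h}_i$ is written as:

$$
\begin{aligned}
p(\mathbf{h}_i \mid \mathbf{x}_{ij}, \boldsymbol{\theta})
&\propto p(\mathbf{x}_{ij} \mid \mathbf{h}_i, \boldsymbol{\theta})\, p(\mathbf{h}_i) \\
&= \int p(\mathbf{x}_{ij}, \mathbf{w}_k \mid \mathbf{h}_i, \boldsymbol{\theta})\, p(\mathbf{h}_i)\, d\mathbf{w}_k \\
&= \int p(\mathbf{x}_{ij} \mid \mathbf{h}_i, \mathbf{w}_k, \boldsymbol{\theta})\, p(\mathbf{w}_k)\, p(\mathbf{h}_i)\, d\mathbf{w}_k \\
&= \int \mathcal{N}(\mathbf{x}_{ij} \mid \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{w}_k \mid \mathbf{0}, \mathbf{I})\, \mathcal{N}(\mathbf{h}_i \mid \mathbf{0}, \mathbf{I})\, d\mathbf{w}_k \\
&= \mathcal{N}(\mathbf{x}_{ij} \mid \mathbf{m} + \mathbf{V}\mathbf{h}_i, \boldsymbol{\Phi})\, \mathcal{N}(\mathbf{h}_i \mid \mathbf{0}, \mathbf{I}) \\
&\propto \exp\left\{ \mathbf{h}_i^{\mathsf T}\mathbf{V}^{\mathsf T}\boldsymbol{\Phi}^{-1}(\mathbf{x}_{ij} - \mathbf{m}) - \frac{1}{2}\mathbf{h}_i^{\mathsf T}\left(\mathbf{I} + \mathbf{V}^{\mathsf T}\boldsymbol{\Phi}^{-1}\mathbf{V}\right)\mathbf{h}_i \right\}
\end{aligned} \tag{2}
$$

where $\boldsymbol{\Phi} = \mathbf{U}\mathbf{U}^{\mathsf T} + \boldsymbol{\Sigma}$. Comparing this posterior density with a standard Gaussian, we have

$$
\begin{aligned}
\langle \mathbf{h}_i \mid \mathbf{x}_{ij} \rangle &= \left(\mathbf{I} + \mathbf{V}^{\mathsf T}\boldsymbol{\Phi}^{-1}\mathbf{V}\right)^{-1}\mathbf{V}^{\mathsf T}\boldsymbol{\Phi}^{-1}(\mathbf{x}_{ij} - \mathbf{m}) \\
\langle \mathbf{h}_i\mathbf{h}_i^{\mathsf T} \mid \mathbf{x}_{ij} \rangle &= \left(\mathbf{I} + \mathbf{V}^{\mathsf T}\boldsymbol{\Phi}^{-1}\mathbf{V}\right)^{-1} + \langle \mathbf{h}_i \mid \mathbf{x}_{ij} \rangle \langle \mathbf{h}_i \mid \mathbf{x}_{ij} \rangle^{\mathsf T}.
\end{aligned} \tag{3}
$$

If all of the i-vectors of speaker $i$ are given, we evaluate the joint posterior

$$
p(\mathbf{h}_i \mid \mathbf{x}_{ij}\ \forall j \text{ and } k, \boldsymbol{\theta}) \propto \prod^{K} \cdots
$$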
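
As a numerical sanity check on the single-i-vector posterior moments in Eq. (3), the following NumPy sketch evaluates them and compares the result with the textbook formula for conditioning a joint Gaussian over (h_i, x_ij). The function name `speaker_posterior`, the toy dimensions, and the random parameters are assumptions for illustration; this is not code from the notes.

```python
import numpy as np

def speaker_posterior(x, m, V, U, Sigma):
    """Posterior mean and second moment of h_i given one i-vector x (Eq. 3)."""
    Phi = U @ U.T + Sigma                        # Phi = U U^T + Sigma
    Phi_inv_V = np.linalg.solve(Phi, V)          # Phi^{-1} V without forming Phi^{-1}
    L = np.eye(V.shape[1]) + V.T @ Phi_inv_V     # I + V^T Phi^{-1} V
    L_inv = np.linalg.inv(L)                     # posterior covariance of h_i
    # Phi is symmetric, so (Phi^{-1} V)^T equals V^T Phi^{-1}
    mean = L_inv @ (Phi_inv_V.T @ (x - m))
    second = L_inv + np.outer(mean, mean)        # <h_i h_i^T | x>
    return mean, second

# Toy, made-up parameters (assumed, not from the notes)
rng = np.random.default_rng(1)
D, P, Q = 6, 3, 2
m = rng.standard_normal(D)
V = rng.standard_normal((D, P))
U = rng.standard_normal((D, Q))
Sigma = np.diag(rng.uniform(0.5, 1.5, D))
x = rng.standard_normal(D)

mean, second = speaker_posterior(x, m, V, U, Sigma)

# Cross-check by conditioning the joint Gaussian of (h_i, x):
# cov(x) = V V^T + U U^T + Sigma and cov(h_i, x) = V^T
C = V @ V.T + U @ U.T + Sigma
mean_ref = V.T @ np.linalg.solve(C, x - m)
cov_ref = np.eye(P) - V.T @ np.linalg.solve(C, V)
```

The two routes agree up to floating-point error. The form in Eq. (3) inverts a P × P matrix involving the speaker subspace, whereas the reference route solves a D × D system; Φ still has to be inverted or factorized once per model in either case.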

Similar resources

SNR-invariant PLDA modeling for robust speaker verification

In spite of the great success of the i-vector/PLDA framework, speaker verification in noisy environments remains a challenge. To compensate for the variability of i-vectors caused by different levels of background noise, this paper proposes a new framework, namely SNR-invariant PLDA, for robust speaker verification. By assuming that i-vectors extracted from utterances falling within a narrow SN...

Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification

Although i-vectors together with probabilistic LDA (PLDA) have achieved a great success in speaker verification, how to suppress the undesirable effects caused by the variability in utterance length and background noise level is still a challenge. This paper aims to improve the robustness of i-vector based speaker verification systems by compensating for the utterance-length variability and noi...

Noise robust speaker verification via the fusion of SNR-independent and SNR-dependent PLDA

While i-vectors with probabilistic linear discriminant analysis (PLDA) can achieve state-of-the-art performance in speaker verification, the mismatch caused by acoustic noise remains a key factor affecting system performance. In this paper, a fusion system that combines a multi-condition SNR-independent PLDA model and a mixture of SNR-dependent PLDA models is proposed to make speaker verificati...

SNR-dependent mixture of PLDA for noise robust speaker verification

This paper proposes a mixture of SNR-dependent PLDA models to provide a wider coverage on the i-vector spaces so that the resulting i-vector/PLDA system can handle test utterances with a wide range of SNR. To maximise the coordination among the PLDA models, they are trained simultaneously via an EM algorithm using utterances contaminated with noise at various levels. The contribution of a train...

Dataset-invariant covariance normalization for out-domain PLDA speaker verification

In this paper we introduce a novel domain-invariant covariance normalization (DICN) technique to relocate both in-domain and out-domain i-vectors into a third dataset-invariant space, providing an improvement for out-domain PLDA speaker verification with a very small number of unlabelled in-domain adaptation i-vectors. By capturing the dataset variance from a global mean using both development ...

Publication date: 2016